Received: by cheltenham.cs.arizona.edu; Fri, 3 Jan 1997 17:23:33 MST
Date: Fri, 3 Jan 1997 09:28:05 -0600
Message-Id: <97010309280532@cowboy.biomed.com>
From: graham@cowboy.biomed.com (Steve Graham, extension 5224)
To: icon-group@cs.arizona.edu
Subject: Re: Help for an Icon Neophyte
Stuart Robinson wrote:
>> Hello, there.
>>
>> In my last posting, I put out an APB on Corre's book, Icon Programming
>> for Humanists, and received many helpful responses. Thanks. I finally
>> managed to lay my hands on the book.
>>
>> Now that I have started teaching myself Icon, I sometimes find that I
>> reach dead-ends, and there is no one around to steer me in the right
>> direction. So, I thought that I would post problem programs to this
>> newsgroup and hope that some kind soul with more knowledge of Icon would
>> step in and lend a hand. I hope this is appropriate.
>>
>> I am trying to write a program to test Zipf's Law. Zipf was a linguist
>> who observed that if you take the frequency of each word in a text and
>> rank the words from most to least frequent, a log-log plot of frequency
>> against rank comes out roughly as a straight line.
>>
>> The program that I have created thus far is taken almost directly from
>> The Icon Programming Language. Here it is:
>>
>> procedure main()
>>
>>    num := 0
>>
>>    wlist := sort(countwords(), 4)
>>    while num +:= 1 do {
>>       write(left(num||" "||get(wlist), 12), right(get(wlist), 4))
>>       }
>> end
>>
>> procedure countwords()
>>
>>    wordcount := table(0)
>>
>>    while line := read() do
>>       line ? {
>>          while tab(upto(&letters)) do
>>             wordcount[tab(many(&letters))] +:= 1
>>          }
>>
>>    return wordcount
>>
>> end
>>
>> I would like the program to take a text, convert all of the letters
>> to lowercase (so that "Which" and "which" are not counted as separate
>> words), create a table of all of the words along with their associated
>> frequencies, and to print out the results in ranked fashion. For
>> example, given the following text
>>
>> Which is the dog which can outrace the cars?
>>
>> I would like output more or less as follows (please ignore formatting; I
>> will eventually use entab() for ease of export into a program such
>> as Excel):
>>
>> 1 the 2
>> 2 which 2
>> 3 can 1
>> 4 cars 1
>> 5 dog 1
>> 6 is 1
>> 7 outrace 1
>>
>> I have run into the following problems. First, I was unsuccessful at
>> mapping uppercase letters onto lowercase letters. I assume that I want
>> to use something like map(line, &ucase, &lcase), but where should I
>> insert it? And should the csets &ucase and &lcase be bracketed by
>> single quotation marks? Second, the program seems to hang at the
>> end. I have no idea why that's happening.
>>
>> Eventually I would like to have the program do the log-log plot, but for
>> now I can export the resulting output.
>>
>> Thanks in advance for any help you can provide.
>>
>> Stuart Robinson
>> srobinso@reed.edu, robinstu@ohsu.edu
>>
To fix your problem with lowercase, change the following:
while line := read() do
   line ? {
      while tab(upto(&letters)) do
         wordcount[tab(many(&letters))] +:= 1
      }
to:
while (line := read()) do {
   line ? {
      while tab(upto(&letters)) do {
         word := map(tab(many(&letters)),&ucase,&lcase)
         wordcount[word] +:= 1
         }
      }
   }
I believe that this will also fix the hang-up problem.
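The change above only touches countwords(), so the hang at the end probably has another cause. In main(), the loop condition num +:= 1 always succeeds. Once get(wlist) fails on the empty list, write() fails as well. A failing loop body does not end a while loop, so the program keeps incrementing num forever without printing anything.

What follows is a minimal sketch of a reworked program, assuming the text arrives on standard input. It ends the loop when the list is exhausted and folds case with map(). It also uses pull() to list the most frequent words first, since sort(T, 4) orders the alternating key/value list by ascending count. Note that &ucase and &lcase are keywords, so they take no quotation marks. The names rank and count are illustrative choices, not from the original program.

```icon
# Sketch: rank words by frequency for a Zipf's-law check.
# Reads standard input, folds case, and writes tab-separated
# rank, word, and count, most frequent word first.

procedure main()
   local wlist, word, count, rank

   # sort(T, 4) gives a list of alternating keys and values,
   # ordered by ascending value (the count).
   wlist := sort(countwords(), 4)

   rank := 0
   # pull() removes elements from the right-hand end, so the largest
   # counts come out first, each value just before its key.
   # The loop stops when pull() fails on the empty list.
   while count := pull(wlist) do {
      word := pull(wlist)
      rank +:= 1
      write(rank, "\t", word, "\t", count)
      }
end

procedure countwords()
   local wordcount, line

   wordcount := table(0)
   while line := read() do
      line ? {
         # &ucase and &lcase are keywords and are not quoted.
         while tab(upto(&letters)) do
            wordcount[map(tab(many(&letters)), &ucase, &lcase)] +:= 1
         }
   return wordcount
end
```

The tab-separated columns should paste directly into a spreadsheet. Words with equal counts may come out in a different order from the sample output above.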
Steve Graham
graham@cowboy.biomed.com